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ABSTRACT 



Aims. We investigate the statistical properties of the extreme events of the solar cycle as measured by the sunspot number. 
Methods. The recent advances in the methodology of the theory of extreme values is applied to the maximal extremes of the time 
series of sunspots. We focus on the extreme events that exceed a carefully chosen threshold and a generalized Pareto distribution is 
fitted to the tail of the empirical cumulative distribution. A maximum likelihood method is used to estimate the parameters of the 
generalized Pareto distribution and confidence levels are also given to the parameters. Due to the lack of an automatic procedure for 
selecting the threshold, we analyze the sensitivity of the fitted generalized Pareto distribution to the exact value of the threshold. 
Results. According to the available data, that only spans the previous ~250 years, the cumulative distribution of the time series is 
bounded, yielding an upper limit of 324 for the sunspot number. We also estimate that the return value for each solar cycle is ~ 188, 
while the return value for a century increases to ~ 228. Finally, the results also indicate that the most probable return time for a large 
event like the maximum at solar cycle 19 happens once every ~ 700 years and that the probability of finding such a large event with 
a frequency smaller than ~ 50 years is very small. In spite of the essentially extrapolative character of these results, their statistical 
significance is very large. 

Key words. Methods: data analysis, statistical — Sun: activity, magnetic fields 



1. Introduction 

When analyzing a given physical process, a large amount of ob- 
servations opens the possibility of applying the power of statisti- 
cal techniques. The well-developed field of statistics has devised 
a panoply of methods that allow us to infer properties from the 
observed phenomena. Sometimes, these statistical methods are 
so powerful that one can extract statistically significant informa- 
tion from noisy or a reduced set of observations. 

One of the most striking examples of this is the case of ex- 
treme events. In spite of their inherent rarity, extreme events 
sometimes play important roles and turn out to be of fundamen- 
tal importance. In certain fields (analysis of precipitation and 
floods, maximum temperatures, global climate, etc.), extreme 
events are those that produce radical and serious changes. For 
this reason, the extreme value theory is well developed in these 
fields and has been applied during the last decades with great 
success. 

We can find scarce applications of the theory of extreme val- 
ues in the field of Astrophysics (e.g..lBhavsar & BarrowU l985; 
iBhavsati Il990t iBernstein & Bhavsarl |2001|) . This situation is 
somewhat surprising because usually the most interesting events 
that astrophysicist study are the most extreme ones. In this work, 
we will apply the theory of extreme values and their most recent 
advances to the investigation of the solar activity cycle. One of 
the best-known indicators of solar activity is the sunspot rela- 
tive number. This indicator closely follows the 1 1 -year solar ac- 
tivity cycle and has been continuously tabulated from ~1750. 
During the last decades, there has been an increasing interest 
in the prediction of the upcom ing solar cycles and different tech- 
niques have been applied (e.g ., Li et all 1200 UlOrfila et al.L l2002; 
Dikpati & Gilman, 2006; Du, 200f$7^ie main cause for this in- 
terest has to be found, not only on the pure scientific curios- 
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ity of knowing in advance the amplitude of the following solar 
maximum, but in the influence of a strong (or weak) solar max- 
imum on the interplanetary medium. A strong solar maximum 
can induce solar storms that can damage the enormous amount 
of satellites around the Earth. Another reason has to be found 
on the feasibility of a tripulated mission to Mars. The required 
long journey has to be carried out away from strong solar max- 
ima in order to minimize the exposure to dangerous doses of 
radiation. Our work aims at analyzing the statistical properties 
of these extreme activity events so that their frequency and am- 
plitude can be estimated. This could serve to gain some insight 
on the efficient amount of shielding needed to protect satellites 
and/or tripulated missions. 

For this reason, having the ability to predict when such ex- 
treme solar cycle events would happen is of interest. A few 
works have been oriented to wards the inves tigation of the statis- 
tics of such extreme events. Sisco e ( 1976a, b) used the theory of 
extreme value t o analyze the largest sun spot number per solar 
cycle. Later on, IWillis & Tulunayl (1 19791) extended the previous 
works by analyzing the data from sunspots umbrae, complete 
sunspots (umbrae+penumbrae) and faculae during nine solar cy- 
cles. Apparently, this kind of analysis has not been repeated dur- 
ing the last 25 years, where more than two solar cycles have 
occurred. Additionally, the previous works were also based on 
older approaches to the statistics of extreme values. 

We extend in this paper the previous analysis of the solar cy- 
cle based on the monthly variation of the sunspot number. The 
interest of this work resides in the application of the more recent 
techniques to infer the statistical properties of the extreme val- 
ues of the activity cycle. Our purpose is to give clues that help us 
forecast the extremes of the solar cycle and their occurrence fre- 
quency for extended periods of time (not only in the future, but 
also in the past). In more detail, once the probability distribution 
function of extreme events (largest number of sunspots) is char- 
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acterized, we investigate whether this distribution is bounded or 
not and which are the typical events that we can expect for a 
given amount of time. 



2. Extreme Value Theory 



It is well-known that the Central Limit theorem (i.e., iFelleri 
1197 lb states the asymptotic distribution that a sum of identically 
distributed random variables with finite variances will follow 
when the number of these variables is sufficiently large. What 
is less known is that a similar theory exists for the distribu- 
tion of the maximum values taken by a random variable. This 
apparent lack of awareness has to be found in the fact that, al- 
though the theoretical formalism is known from several decades, 
it has taken a long time to devise practical methods to apply the 
formalism to real data. As well as the Central Limit theorem 
deals with the functional form of the part of the distribution with 
largest probability (where the majority of the events occur) when 
a sufficiently large number of random variables with finite vari- 
ance are summed, the extreme value theory deals with the tails 
of such distributions. As a consequence, since by definition very 
few events occur in the tails of the distribution, this lack of ob- 
servables transforms the estimation of the probability tail a very 
difficult task. Fortunately, statisticians have developed efficient 
methods that can make use of the few events in the tails to esti- 
mate the statistical properties of such extreme events. 

Two different approaches are fundamentally used for the 
analysis of extreme events. The essential difference is in the way 
extreme events are defined. Let {X,, i = 1 . . .n] be a sequence 
of independent random variables that have a common distribu- 
tion function F. Each X, can be considered as a measure of a 
random process taken with a certain timestep At. The first ap- 
proach, termed the block maxima approach dFisher & Ti ppett. 
Il928t IColesl 12001). takes as extreme events the maximum (or 
minimum) value of the random variable in fixed intervals of 
time. For instance, if the timestep is considered to be one hour, 
the maximum among 24 consecutive measurements is the daily 
maximum. The second ap proach, term ed the peaks over thresh- 
old (POT) approach fe.g.. lColesLl200lh . takes as extreme events 
all the values of the time series that excee d a given thre sh- 
old. The first approach w as the one taken bv lSiscod dl976alfbh ; 
IWillis & Tulunav (1979) for the analysis of extreme values in 
the temporal evolution of sunspot numbers. It has been selected 
traditionally for the analysis of time series in which a clear sea- 
sonal (periodic) behavior is detected. However, in spite of the 
theoretical simplicity of the block maxima approach, it suffers 
from important drawbacks. One of the most important limita- 
tions is that it tends to make an inefficient use of the data. The 
importance of the necessity to overcome this lack of efficiency 
lies in the fact that the extreme value theory deals with extreme 
events, which are, by definition, scarce. For this reason, POT is 
becoming the method of choice in recent applications of the the- 
ory of extreme events, fundamentally for the efficient use of the 
reduced amount of data available. 

We briefly present the theoretical results that we use in this 
paper. Assume that we measure a random variable at constant 
time intervals and that we obtain a sequence [Xj, i= 1 . . . n). The 
measurements have to be independent and they have a common 
distribution function. Then, the sequence can be described with 
the aid of the cumulative distribution function F. Since we are 
focusing on extreme events, we will be interested in the tail of 
such distribution. The POT approach is based on analyzing what 
is known as the conditional excess distribution function, F u (y), 
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Fig. 1. Solar cycle variation of the International Sunspot 
Number. In order to decrease the dispersion, we focus on the 
monthly averaged value. 



defined as: 



F u (y) = P(y>X-u\X>u), > y < oo, 



(1) 



where X is the random variable, u is the threshold used to distin- 
guish the maximum values and we have assumed (for simplic- 
ity) that X can take infinitely large values. This F u (y) function 
therefore describes the cumulative probability that, given a value 
of the random variable larger than the threshold, it exceeds the 
threshold by a quantity y. In the case that the complete cumula- 
tive distribution function F is perfectly known, F u (y) would also 
be known. In realistic applications, the cumulative distribution 
has to be empirically estimated and the tail of the distribution is 
often poorly sampled. For this reason, the estimation of F u (y) is 
usually not possible or uncertain. Theoretically, there is a rela- 
tion between F and F u (y): 



F u (y) 



F(u +y)- F(u) 
1 - F(u) ' 



y > 0. 



(2) 



The feasibility of the PO T approac h finds its roots in the pow- 
erful theoretical result by Pickandsj (1 19751) . who derived that, for 
a very large class of underlying F distribution^ the conditional 
excess distribution function is well approximated by the gener- 
alized Pareto distribution (GPD): 



F u (y) 



■US 



iff ^0 
iff = 0, 



(3) 



with y € [0, oo] if f > and y e [0, -cr/f ] if f < 0. The GPD is 
a general cumulative distribution function that is able to model 
the behavior of different tails depending on the exact value of the 
parameters. The quantity f gives information about the shape, in 
particular, the "strength" of the tail. An exponential-type dis- 
tribution (normal, exponential, log-normal) is found for f = 0, 
a bounded beta-type distribution (beta or uniform) is found for 
f < (zero probability is assigned for events above a certain 
limit) while a heavy-tailed Pareto-type distribution (power law, 



1 According to Pickands ( 1975), the class of distributions F that ful- 
fill the theorem are those whose (block) maxima follow one of t he three 
families of extreme value distributions (Fisher & Tippetti ri928f) . These 
are the Gumbel, Frechet or Weibull distributions. 
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Pareto, Cauchy) is found for £ > (the tail falls slower than an 
exponential). The quantity cr gives information about the scale 
of the distribution. Although not applied in this work, it is impor- 
tant to remind that there is a duality between these distributions 
and the extreme value Gumbel, Frechet and Weibull distributions 

of the block maxima approach. 

With the aid of the theorem developed by Pick andsl d 19751) . 
the functional form of the cumulative distribution function for 
events above u can be written. Making x - u+y and solving for 
F(x) in Eq. d2}, the cumulative distribution function for events 
above the threshold u can be written as: 



F(x) = 1 - 



N„ 



1 H (x — u) 

cr 



-l/f 



(4) 



The previous expression assumes that the value of the cumula- 
tive distribution at u is given by the estimation (« - N u )/n (the 
so-called historical simulation), with n the total number of points 
in the time series we are analyzing and N u the number of points 
above the threshold. This estimation is expected to be accurate 
if the threshold is high enough. Obviously, the functional form 
described by Eq. (0]i is only valid for x > u. Once the function 
F is known, all the statistical properties of the extreme events 
can be calculated. One of the most interesting statistical proper- 
ties is the so-called "return time". This is defined as the typical 
time that one has to wait until an event of amplitude x ret happens 
again. It can be estimated from Eq. © by setting x = x let and 
making the identification f~| = 1 - F(x Kt ). The units of this time 
variable depend on the units of the timestep At. 

The parameters of the GPD are usually obtained from the 
empirical data by means of a maximum likelihood estimation. 
Assuming that y i , yi, ■ ■ • , y^ are the N values of the origina l time 
series that exceed the threshold u, the log-likelihood is IColesI 
J2001 : 

1 + i)Z log ( 1 + ?)- (5) 

Since the value of the cr and £ parameters that maximize the 
log-likelihood cannot be found analytically, the l{cr,^) function 
is m aximized using st andard numerical optimization methods 
(e.g JPress et all[l986l) . Once the maximum likelihood value of 
the parameters are found, it is possible to calculate confidence 
intervals. A standard method relies on the assumption that the 
likeli hood functio n is given approximately by a normal distribu- 
tion [Colli d2001l) . Under this assumption, the confidence inter- 
val can be estimated with the aid of the estimated curvature of 
the likelihood function, which is proportional to the Hessian ma- 
trix of Eq. <j5j . A more refined method, and the one that we use in 
this paper, is to use the information from the likelihood function 
itself. This allows us to give asymmetric confidence intervals in 
the parameters due to the skewness of the likelihood function. 



3. Application to the Solar Cycle 

The time series representing the solar activity cycle during the 
last 257 years is shown in Fig. Q] The data represents the 
sunspot number, that is an estimation of the number of individual 
sunspots (counting individual sunspots and group of sunspots). 
The data has been tabulated since 1750 and it is nowadays known 
as the International Sunspot NumbeiQ- The time series presents 
a clear regularity with time as a consequence of the influence of 
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Fig. 3. Value of the cr and £ parameters obtained for different val- 
ues of the threshold. When the threshold is increased, the stan- 
dard deviation of the errors in the parameters increases due to 
the decrease in the number of available extreme values. On the 
contrary, when the threshold is decreased, the error bars of the 
parameters decrease. However, a trade-off has to be found so that 
the threshold is made large enough so that the GPD is a good ap- 
proximation of the empirical distribution but not too large so that 
the parameter estimation is corrupted by poor sampling effects. 
An intermediate value of 149.4 (4% of the points lie above this 
value) seems to represent a good compromise. 



the solar cycle on the surface magnetism. Under these circum- 
stances, the premises on which the POT formalism lies are not 
fulfilled. In particular the random variables are not independent 
because there is a certain degree of correlation between consec- 
utive events: a large sunspot number is typically followed by 
another large sunspot number. Several techniques have been de- 
vised to overcome this difficulty. One of the most applied ones 
is the "de-clustering" of the time series (e.g JColesLl200l . The 
method consists on locating clusters in the excedance over the 
threshold and representing them by the maximum value inside 
each cluster. This has two undesired consequences: (i) the num- 
ber of events available for the GPD analysis is reduced and (ii) 
a somewhat arbitrary criterion for the clus t er defi nition has to 
be included. Recently, iFawcett & Walshawl d2007l) have shown 
that this de-clustering technique introduces biases in the max- 
imum likelihood estimations of cr and They also show that 
the direct application of the POT method using the whole time 
series neglecting any temporal periodicity leads to negligible bi- 
ases. The price to pay is that the confidence intervals for the 
GPD para meters are lar ger than th ose obtained using s t andar d 
techniques lColesI d2001l) . Following lFawcett & Walshawl d2007l) . 
we apply the POT method to the sunspot number time series 
without any de-clustering technique. The POT formalism also 
requires the underlying distribution of the random variables to 
be stationary (see, e.g. IColesT, 1200 ll) . The large amount of works 
that are successful in reproducing the time evolution of the 
International Sunspot Number using deterministic methods (e.g., 
I Verdes et all l2000t loikpati & Gilmanl 120061: IChoudhuri et all 
2007: lCameron & Schusslen, 120071) suggest that this is the case. 
Therefore, we can safely consider that the physics (probability 
distribution function of the random variables) driving the solar 
cycle does not vary with time. 

In this paper we only focus on the statistics of the upper tail 
of the distribution, i.e., maximal values of the sunspot number. 
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Fig. 2. Summary of the quality of the fit of the empirical cumulative distribution to a generalized Pareto distribution. The left panel 
shows the value of the cumulative distribution for different values of the sunspot number above the selected threshold of 149.4 (that 
leaves only 4% of the points on the time series with larger values) together with the GPD fit. The right panel shows the variation of 
the quality of the fit when the uncertainties in the cr and £ parameters of the GPD are taken into account. It is clear that the GPD 
seems to produce a good representation of the empirical cumulative distribution. 



Table 1. Parameters and return level estimates. The values are 
obtained for a threshold of u — 149.4. 



Parameter 


Estimate 


95% confidence interval 


cr 


27.157 


[23.525,31-745] 




-0.155 


[-0.251,0.035] 


— cr/f + u 


324.0 


[270.0, 990.0] 


1 1 -year return level 


187.8 


[180.2,203.0] 


100-year return level 


228.4 


[208.1,253.8] 



Although the application of the POT formalism to the analy- 
sis of the distribution of minimal values is also possible, larger 
time series are needed. We briefly discuss this issue in $4] In or- 
der to apply the POT formalism, a threshold u has to be fixed. 
The threshold has to be sufficiently large so that the generalized 
Pareto distribution is a suitable functional form for describing 
the tail of the cumulative distribution and it has to be sufficiently 
small so that enough values are available to give an accurate es- 
timation of the parameters of the GPD. There is not any known 
automatic procedure for the selection of the threshold. In this 
paper we choose a value of the threshold based on reasonable 
ideas and we verify which is the behavior of the parameters of 
the GPD for different values of the threshold. In our case, u has 
been chosen as the value that leaves 96% of the points of the time 
series below and only 4% of the points are considered as extreme 
values. For the dataset shown in Fig.Q] we find u = 149 .4. From 
the original set of 3096 data points, we leave 122 points above 
the threshold which are used to fit the GPD neglecting any time 
dependence. The empirical cumulative distribution function for 
points above the threshold is built and the values of <x and £ that 
give the best fit are obtained. 

The GPD parameters have been estimated maximizing the 
log-likelihood given by Eq. (0. As noted above, such an ap- 
proach permits to obtain the most probable values and their con- 
fidence intervals. In our case, we obtain cr = 27.157^-^2 and 
£ = -0.155*2™ as shown in TableQ] With these values, the en- 
suing cumulative distribution function is shown in the left panel 
of Fig. |2] where we show the value over the threshold on the hor- 
izontal axis and the value of the GPD on the vertical axis. The 



right panel of Fig. |2]shows the empirical cumulative distribution 
versus the fit F u (y) using the GPD. This is the so-called proba- 
bility plot, and it clearly indicates that the GPD produces a good 
approximation to the empirical cumulative distribution. Several 
interesting points deserve comment. Firstly, the confidence in- 
tervals presented in Table Q] are probably underestimating the 
true confidence intervals because we have neglected the tempo- 
ral dependence. Secondly, a negative value for the shape param- 
eter £ is found. The evidence for this is strong because the 95% 
confidence interval is almost exclusively in the negative domain. 
According to the results presented before, this suggests that the 
cumulative distribution is bounded and that zero probability is 
assigned to excedances above a certain limit, given by the ratio 
*\im — u — -cr/^. In our case, we find that the fitted GPD assigns 
zero probability to extreme values larger than 175.6 above the 
threshold. Taking into account the chosen threshold, this gives 
a limit of xmn = 324.0. This value is consistent with the data 
presented in Fig. Q] Note also that the 95% confidence interval 
is [270.0,990.0], which has been obtained from the likelihood 
function. It gives a very stringent value of the lower limit (as ob- 
vious because of the presence of a large amount of data below 
the threshold) but a very large upper limit. A 68% confidence 
interval is estimated to be [288,420]. This is a clear indicative 
of the asymmetry of the likelihood function with a very long 
tail towards larger x\i m , giving the idea that the information for 
a strong upper limit is hardly present in the data. However, it is 
important to take into account that these results can fluctuate de- 
pending on the exact value of the chosen threshold. Furthermore, 
it can also fluctuate in the future if the same calculation is car- 
ried out with more data from subsequent solar cycles. In order 
to analyze the strength of this conclusion, we have carried out 
the fit of the cumulative distribution for different values of the 
threshold. The results are shown in Fig.[3]for different values of 
the threshold u, the upper panel showing the values obtained for 
cr and the lower panel the values for If the GPD is a reason- 
able model for the excedances above a certain threshold mo, it 
shoul d remain rea sonable for larger values of the threshold (see, 
e.g. JColesl 1200 lb . As a consequence, the estimates of cr and £ 
should remain constant above uq provided uq is a valid thresh- 
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Fig. 4. Return value for a given amount of time (or return time 
for a given level) for the fit obtained with a threshold of u — 
149.4. The dashed lines indicate the confidence levels of the 
return value taking into account the uncertainty in the GPD 
parametrization. The vertical dotted line indicates 1 1 years, ap- 
proximately one solar cycle. The horizontal line indicates the 
value of the maximum found on solar cycle 19. 

old. We see that this happens in our case for u > 150, the value 
we have chosen above. It is interesting to note that for u ~ 120, 
approximately 9% of the points lie above the threshold, while 
for // ~ 180, less than 1% of the points lie above. There is a clear 
increase on the uncertainty of the retrieved parameters when the 
threshold increases due to the lack of points. What seems clear 
is that, for u < 150, the £ parameter has only negative values in- 
side the error bars. For larger values of the threshold, both neg- 
ative and positive values for £ are obtained, although negative 
values tend to be more probable. Focusing now on the threshold 
u = 149.4 and taking into account the uncertainty in the fitted 
parameters, we find that the upper limit of the tail can be found 
inside the interval (217,430), as indicated in TableQ] 

Once the cumulative distribution given by Eq. (|4]l is ob- 
tained, several statistical properties of the extreme values can 
be inferred. One of the most interesting in terms of prediction 
of future events are the so-called return times and return level. 
The return time is the typical time one has to wait until an event 
reaches and surpasses a threshold. Similarly, the return level is 
the typical extreme event one would find after waiting for a given 
amount of time. These quantities are obtained easily from Eq. © 
and they are shown in Fig. [4] For consistency, we only show the 
results for values above the chosen threshold. The dashed lines 
present the confidence interval that are induced by the uncer- 
tainty in the inferred parameters of the GPD. The vertical dot- 
ted line approximately indicates a solar cycle, of the order of 1 1 
years. For such amount of time, the return level equals to ~ 187. 
If we take into account the confidence interval, we find that the 
return level lies in the interval [180, 203]. These values appear to 
be consistent with the empirical results from the past, according 
to Fig. [T] This is produced by the similarity between the em- 
pirical tail of the extreme distribution and the GPD, that makes 
the statistical properties inferred from the GPD a good estima- 
tion of the statistical properties of the empirical distribution. If 
we assume that the previous 23 solar cycles are representative of 
the behavior of the Sun during a longer time and unless a long- 
period modification of the solar cycle exists, the previous estima- 
tions of the return levels are statistically significant. For the case 



of the return level for 100 years, we find a value of ~ 228, with a 
confidence interval of [208, 254]. Again, this value is consistent 
with the data. 

Concerning the return times, it is interesting to estimate them 
for the most extreme cases in the observed dataset. It is important 
to have in mind that relying on the GPD for very long-term ex- 
trapolation can be extremely dangerous. The dataset on which 
we have applied the extreme value theory spans only ~ 250 
years, so that one should not blindly rely on the return values 
for large events if they only happen once in the original data. 
For instance, according to the GPD, an event like the extremely 
strong peak of cycle 19 (the maximum of 253.8 took place dur- 
ing 1957) occurs once every ~ 700 years. Taking into account 
the uncertainty in the GPD fit, we find that it is possible to give 
only a lower limit to this return time because the upper limit 
is unbounded. The horizontal line indicates the maximum value 
of 253.8 obtained during cycle 19. The intersection of this hor- 
izontal line with the solid line given by the GPD extrapolation 
occurs at ~ 700 years, showing the most probable value of the 
return time. The intersection with the dashed lines indicate the 
intervals of 95% confidence. In this case, we find that the GPD 
extrapolation implies that such an extreme event happens, with 
95% probability, with a frequency above once every ~ 50 years, 
with an apparently unbounded upper limit. 

We want to leave a word of caution on the values obtained 
above because of their extrapolative character. However, we also 
want to stress the fact that this extrapolative character is based on 
strong theoretical roots. A small variation on the results obtained 
in this work might be expected as soon as more data showing 
more extreme events is available. This variation with respect to 
the values presented above should be very small if the underly- 
ing probability distribution function (essentially, the physics that 
drives the solar cycle) that we have calculated does not change 
appreciably with time. 

4. Concluding remarks 

We have presented an extreme value theory analysis of the so- 
lar cycle as measured by the sunspot number. The peak-over- 
threshold method has been applied. The analyzed time series 
presents a certain degree of correlation because a large num- 
ber of sunspots is usually followed by another large number of 
sunspots and thes e variables cannot be consid ered to be uncor- 
rected. Following lFawcett & Walshawl d2007l) . we have applied 
the POT method to the time series without any de-clustering 
technique. As a consequence, the confidence intervals presented 
in this work could be an underestimation of the real confidence 
intervals because the likelihood function is affected by the pres- 
ence of time correlation in the time series. Our results indicate 
that the distribution of extreme solar cycle events is bounded so 
that the value of 324 cannot be exceeded. The analysis of the 
confidence intervals give that, with 95% confidence, this maxi- 
mum value is larger than 288. Of more relevance are the 1 1-years 
and 100-years return values. The results indicate that there is a 
very high probability of finding values in the range [180,203] 
every solar cycle, and values in the range [208,254] every ~10 
solar cycles. Additionally, we have shown that an extreme event 
like that on solar cycle 19 (during 1957) occurs with a frequency 
above once every ~50 years. 

The results obtained in this paper are based on the statisti- 
cal analysis of the tail of the sunspot number distribution. This 
analysis is driven by the extreme value theory that is based on 
strong theoretical roots developed during the last 50 years. Such 
application relies on the assumption that the limiting behavior 
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of the stochastic process behind the observed time series can 
be obtained from the application of certain mathematical lim- 
its. Then, a functional form for the tails of distributions is avail- 
able and we only have to fit this tail distribution to our dataset. 
However, this approach has limitations. One of the strongest is 
that it is not yet clear whether the limiting mathematical models 
that we have used can be directly applied to finite time series. 
It is expected that, in the limit of an infinitely large time se- 
ries, the models correctly reproduce the tails of the underlying 
distribution. However, when the time series is of limited size, 
fluctuations can be of importance and lead to inaccuracies. In 
our case, the results that we present appear to support the fact 
that the statistics of extreme events are correctly reproduced un- 
der the framework of the extreme value theory. The fact that the 
theory explains the already observed extreme events is also fa- 
vorable. The validity of the theory is also supported by the large 
amount of practical applications of the theory that we found in 
the literature. 

The extreme value theory can also be applied to minima. One 
of the most interesting future applications of this approach is to 
estimate the return time for long low-activity periods, with the 
aim of estimating the frequency with which extreme events like 
the Maunder minimum may take place. A much longer time se- 
ries of solar activity would be needed for such an estimation. 
Furthermore, an appropriate re-definition of the time series has 
to be carried out in order to transform these low-activity peri- 
ods into extremes cases. A possibility that could be of interest is 
to use the solar sunspot num ber reconstructed from 14 C activity 
during the last 11000 years (So lanki et all 120041) . This recon- 
struction shows periods of low activity similar to the Maunder 
minimum whose probability (and ensuing return time) could be 
calculated in the framework of the extreme value theory. 
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